This document contains maps and tables pertaining to the (cleaned) Mopeia trial data. Its purpose is to explain the methods used for buffering, as well as provide visualizations and tables useful for the designation of spray / no-spray zones and study participant selection.
It was produced on the morning (GMT+2) of Tuesday, November 1st, 2016. It uses data sent from Charfudin Sacoor to Joe Brew on the evening of Sunday, October 30th, 2016 (Casos problematticos resolvidos_Edgar Jamisse_Mopeia_2016_28_10.xlsx).
In addition to the manual cleaning carried out by the demography team, 2594 houses were flagged and removed in algorithmic cleaning. Houses were removed when suspected of being misclassified into the wrong village. The criteria for flagging a house as “suspicious” are somewhat complicated, but the full logic can be read here: https://github.com/joebrew/zambezia/blob/master/lib/helpers.R
The purpose of buffering is to flag areas which are too close to other villages to be suitable (due to the possibility of contamination). Per the protocol, there should be a 2-kilometer buffer between villages. In other words, any villager living within 1 kilometer of the edge of his or her village is part of the buffer zone: 1 kilometer on each side of a shared border adds up to the 2-kilometer separation.
However, the above approach is too restrictive and results in too much data loss. There are many cases in which a villager may live within 1 kilometer of his or her village’s edge (i.e., in the “buffer”), but is still many kilometers from any villager from another village. In these cases, there is no possibility of “contamination”, so flagging that villager as in the “buffer” doesn’t make sense.
To account for this issue, we construct Delaunay triangles and subsequently create Voronoi polygons from all villagers’ locations. In short, this has the effect of “expanding” each village’s boundary so that it encompasses not only those points in which the villagers live, but also any point which is closer to that village than to any other.
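The “closer to that village than to any other” rule can be illustrated without building any polygons: a location belongs to the Voronoi tile of whichever household is nearest to it, and a village’s expanded area is the union of its households’ tiles. The following is a minimal sketch of that membership rule only, assuming planar coordinates in kilometers; the function name `nearest_village` and the coordinates are hypothetical, and this is not the code used to produce the maps in this document.

```python
import math

def nearest_village(point, households):
    """Return the village of the household closest to `point`.

    `households` is a list of (x_km, y_km, village) tuples in a planar
    projection. This mirrors Voronoi membership: a location belongs to
    the tile of its nearest household.
    """
    px, py = point
    closest = min(households, key=lambda h: math.hypot(h[0] - px, h[1] - py))
    return closest[2]

# Hypothetical coordinates (km), for illustration only; not trial data.
households = [
    (0.0, 0.0, "A"),
    (0.5, 0.2, "A"),
    (6.0, 0.0, "B"),
    (6.3, 0.4, "B"),
]

# Uninhabited locations are still assigned to the nearest village.
print(nearest_village((2.0, 0.0), households))  # A
print(nearest_village((4.0, 0.0), households))  # B
```
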
Take village 7169, for example (‘Mugurrumba’). In the below map, red points are villagers from Mugurrumba, whereas black points are residents of other villages.
If we draw a “precise” literal border around Mugurrumba, it looks like this:
Note how almost all of the residents of Mugurrumba live very close to (or directly on) the border.
Mugurrumba is so small that it is impossible to create an interior 1 kilometer buffer. In other words, ALL Mugurrumba residents would be considered part of the buffer.
So, what we can do is expand Mugurrumba’s “border” further out so that it includes any geospatial point which is closer to Mugurrumba than to any other village. This is called “Voronoi tessellation”, and is commonly used in geospatial applications.
Now, with our “expanded” (Voronoi tessellation tile) border, we can re-draw our 1 kilometer interior buffer:
As can be seen in the above map, some villagers (red points outside of the red zone) are in the buffer, but many villagers (those that are not near to other villages) are not.
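A rough way to sanity-check buffer flags without polygons is a distance test between households. The border between two neighbouring tiles is equidistant from the households on either side, so a household whose nearest other-village household is at least 2 kilometers away cannot be within 1 kilometer of that border. The sketch below uses this as a proxy; it is an assumption-laden simplification rather than the polygon-based method used for the maps, and it may flag somewhat more households than the maps do. The field names, the `flag_buffer` function and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_buffer(households, separation_km=2.0):
    """Map household id -> True when a household from another village
    lies closer than `separation_km` (proxy for being in the buffer)."""
    flags = {}
    for h in households:
        flags[h["id"]] = any(
            o["village"] != h["village"]
            and haversine_km(h["lat"], h["lon"], o["lat"], o["lon"]) < separation_km
            for o in households
        )
    return flags

# Hypothetical households, for illustration only; not trial data.
households = [
    {"id": "h1", "village": 1, "lat": -17.980, "lon": 35.710},
    {"id": "h2", "village": 1, "lat": -17.980, "lon": 35.650},  # ~6 km west of h1
    {"id": "h3", "village": 2, "lat": -17.970, "lon": 35.710},  # ~1.1 km north of h1
]

flags = flag_buffer(households)
print(flags)
```

Here `h1` and `h3` are flagged because they sit about 1.1 kilometers apart in different villages, while `h2` stays in the core even though it may be near its own village’s literal edge.
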
When we apply Voronoi tessellation to the entire district of Mopeia, we de facto classify every inch of territory as part of a village, even if nobody lives there. In other words, any part of Mopeia is considered part of whichever village is nearest. The below map shows the entirety of the Voronoi surface:
We can apply our Voronoi internal buffers and we get the following:
And finally, we can add the location of each household to see which households fall into buffer zones and which don’t.
Using this method, approximately half of our households are not within buffer zones.
The above approach, though effective at avoiding contamination, unnecessarily delineates “borders” at the village level, even though what we’re most concerned about is contamination between spray and no-spray zones. In other words, a border and buffer between two villages of identical spray status is completely unnecessary and leads to an avoidable reduction in our number of eligible study participants.
To address this, we again use Voronoi tessellation. However, instead of defining regions by village, we define only two regions: “spray” and “no-spray”. Each region is multi-polygonal. Borders are defined only where the two regions meet, and buffers are drawn only between areas of opposite spray status.
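The effect of switching from village borders to spray-status borders can be seen with a small distance-based proxy: a household is treated as “buffer” when a household with a different label lies less than 2 kilometers away, and the label is either the village or the spray status. This is only an illustrative sketch under assumed planar coordinates in kilometers; the `in_buffer` function, the field names and the data are hypothetical, and the maps in this document are built from polygons, not from this rule.

```python
import math

def in_buffer(households, key, separation_km=2.0):
    """Map household id -> True when a household with a different value
    of `key` lies closer than `separation_km`."""
    flags = {}
    for h in households:
        flags[h["id"]] = any(
            o[key] != h[key]
            and math.hypot(h["x"] - o["x"], h["y"] - o["y"]) < separation_km
            for o in households
        )
    return flags

# Hypothetical households (coordinates in km); not trial data.
households = [
    {"id": "h1", "x": 0.0, "y": 0.0, "village": 1, "spray": True},
    {"id": "h2", "x": 1.0, "y": 0.0, "village": 2, "spray": True},
    {"id": "h3", "x": 5.0, "y": 0.0, "village": 2, "spray": True},
    {"id": "h4", "x": 6.0, "y": 0.0, "village": 3, "spray": False},
]

by_village = in_buffer(households, "village")
by_spray = in_buffer(households, "spray")

# h1 and h2 border each other but share a spray status,
# so they are buffer at the village level and core at the spray level.
recovered = [hid for hid in by_village if by_village[hid] and not by_spray[hid]]
print(recovered)
```
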
Our spray / no-spray map looks like this:
We add 1 kilometer internal buffers to each border to get the following:
We can then add each household onto our surface to visualize which area (spray, no-spray, or buffer) each household falls into.
Full code for the generation of Delaunay triangles and Voronoi tessellation is available at https://github.com/joebrew/zambezia.
The below map shows all households, except for the 2594 households algorithmically removed due to suspicion of error. The green areas are “no spray”, the red areas are “spray” and the white areas are buffer zones.
Each point (household) is clickable; upon click, the household number, village number, core/buffer status and spray status are shown. Note that even households in white areas (buffers) received a spray / no-spray status; this can be viewed by clicking on those houses, or by viewing the “Household table” at the end of this document.
The below table contains all households, along with the relevant information pertaining to spray status, buffer/core status, number of children, etc. This table is interactive (i.e., it can be re-ordered, etc.). Note that households not pertaining to a village with a designated spray / no-spray status are automatically marked as “no-spray”.
There are 19952 children with clear, correct data who are eligible for study inclusion. Of these, the split between those assigned to spray and no-spray zones is approximately 50/50.
The number of children per village can be seen below. Note that this only includes children from households that were not flagged as “problematic” by the geospatial error detection algorithm.
To assist with the randomized selection of eligible study participants, we randomize the order of the children in each village and then assign a “random_id” to each child. The “random_id” is a number between 1 and the number of children residing in that village.
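The assignment described above amounts to a per-village shuffle followed by numbering. A minimal sketch is shown here, assuming each child is a record with a `village` field; the `assign_random_ids` function, the `child_id` field, the seed value and the sample records are all hypothetical and are not taken from the trial’s code or data.

```python
import random
from collections import defaultdict

def assign_random_ids(children, seed=2016):
    """Shuffle children within each village and number them 1..n.

    A fixed seed (an arbitrary choice here) keeps the ordering reproducible.
    """
    rng = random.Random(seed)
    by_village = defaultdict(list)
    for child in children:
        by_village[child["village"]].append(child)

    result = []
    for village in sorted(by_village):
        group = list(by_village[village])
        rng.shuffle(group)  # random order within the village
        for position, child in enumerate(group, start=1):
            result.append({**child, "random_id": position})
    return result

# Hypothetical children, for illustration only; not trial data.
children = [
    {"child_id": "c1", "village": 1},
    {"child_id": "c2", "village": 1},
    {"child_id": "c3", "village": 1},
    {"child_id": "c4", "village": 2},
    {"child_id": "c5", "village": 2},
]

for row in assign_random_ids(children):
    print(row)
```
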
The below table has one row for each eligible child. The random_id field should be used for study recruitment. Essentially, children should be recruited beginning at random_id 1 for each village, and going up until the minimum number of participants (18) from each village has been reached.
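The recruitment rule can be sketched as taking, for each village, the children with the lowest random_id values until 18 have been listed. The example below is a simplified illustration under assumptions: it does not model refusals or children who cannot be located (in which case recruitment would presumably continue to the next random_id), and the `recruitment_list` function, field names and records are hypothetical.

```python
def recruitment_list(children, per_village=18):
    """Return {village: [child_id, ...]} ordered by random_id,
    capped at `per_village` children per village."""
    selected = {}
    for child in sorted(children, key=lambda c: (c["village"], c["random_id"])):
        chosen = selected.setdefault(child["village"], [])
        if len(chosen) < per_village:
            chosen.append(child["child_id"])
    return selected

# Hypothetical data: village 1 has 20 eligible children, village 2 only 5.
children = (
    [{"child_id": f"v1_c{i}", "village": 1, "random_id": i} for i in range(1, 21)]
    + [{"child_id": f"v2_c{i}", "village": 2, "random_id": i} for i in range(1, 6)]
)

result = recruitment_list(children)
print(len(result[1]), len(result[2]))
```

A village with fewer eligible children than the target, like village 2 in this sketch, simply contributes everyone it has.
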